BACKGROUND OF THE INVENTION



1. Field of the Invention 

This invention relates generally to the field of computer graphics and, more
particularly, to a graphics system which uses spatial dithering to compensate for a loss of
sample precision due to buffering and transmission through data interfaces.

2. Description of the Related Art 


High-quality anti-aliasing involves the generation of a set of samples, and filtering 
the samples (with an anti-aliasing filter) to generate pixels. It would be desirable if 
industry standard graphics cards could be used to form a graphics system capable of 
performing high-quality anti-aliasing. However, industry standard graphics cards 
generally end up throwing away one or more bits of computed color precision in the
process of buffering color values in their internal frame buffers and outputting the color 
values through data interfaces (such as the Digital Video Interface). There exists a need 
for a graphics system and methodology capable of compensating for this loss of color 
precision. 

SUMMARY

A graphics system may be configured with a set of graphics accelerators (e.g.,
industry standard graphics accelerators) and a series of filtering units. Each of the
graphics accelerators may couple to a corresponding one of the filtering units. Each of the
graphics accelerators may be configured to (a) generate a stream of samples in response
to received graphics primitives, (b) add a corresponding dither value to the color
components of the samples to obtain dithered color components, (c) buffer the dithered
color components in an internal frame buffer, and (d) forward truncated versions of the
dithered color components to the corresponding filtering unit. The filtering units may be
configured to perform a weighted averaging computation on the truncated dithered color
components to determine pixel color components.

A host computer may broadcast a stream of graphics primitives to the set of
graphics accelerators. Thus, each of the graphics accelerators may receive the same set of
graphics primitives.

The dither values corresponding to the set of graphics accelerators may have an
average value of 1/2. When the dither value is added to a sample color component, the
ones digit of the dither value is aligned with the least significant bit of the sample color
component that is to survive truncation.

Each of the filtering units is configured to support the weighted averaging
computation by computing partial sums (one partial sum for each of red, green and
blue) corresponding to a subset of the samples falling in a filter support region. The
filtering units are configured to add the partial sums in a pipelined fashion. A last of the
filtering units in the series may be programmably configured to normalize a set of final
cumulative sums resulting from said addition of the partial sums in a pipelined fashion.

In another set of embodiments, a graphics system may be configured to include a
set of rendering processors and a series of filtering units. Each of the rendering
processors couples to a corresponding one of the filtering units. Each rendering processor
RP(K) of the set of rendering processors may be configured to (a) generate a stream of
samples in response to received graphics primitives, (b) add a dither value D_K to a data
component (such as red, green, blue or alpha) of each of the samples in the stream to obtain
dithered data components, (c) buffer the dithered data components in an internal frame
buffer, and (d) forward a truncated version of the dithered data components to the
corresponding filtering unit. The filtering units are configured to perform a weighted
averaging computation on the truncated dithered data components to determine pixel data
components.

The rendering processors reside within original equipment manufacturer (OEM) 
graphics cards (i.e., graphics accelerators). Each of the graphics cards may contain one or 
more of the rendering processors. 

BRIEF DESCRIPTION OF THE DRAWINGS



A better understanding of the present invention can be obtained when the 
following detailed description is considered in conjunction with the following drawings, 
in which: 

Figure 1A illustrates a collection C_SP of sample positions in a virtual screen space
according to one set of embodiments;

Figure 1B illustrates one embodiment of a subset T_K of sample positions used by
a corresponding graphics card GC(K);

Figure 2 illustrates one embodiment for a set of sample positions generated by a 
set of graphics cards in a given sample bin in virtual screen space; 

Figure 3 illustrates one embodiment of a graphics system including a set of 
industry standard graphics cards and a series of filtering units; 

Figure 4 illustrates one embodiment of a set of virtual pixel centers generated by a 
filtering unit FU(K) in a virtual screen space; 

Figure 5 illustrates one embodiment of a sample filtration computation used to 
determine pixel values; 

Figure 6 illustrates a portion of the sample filtration computation handled by a 
single filtering unit according to one embodiment; 

Figure 7 highlights the frame buffer FB(K) and video data port VDP(K) of 
graphics card GC(K) according to one set of embodiments; 

Figure 8 presents a tabulated example of a spatial dithering process in a box 
filtering mode; 

Figure 9 illustrates one embodiment of a graphics system including a set of 
graphics cards and a series of filtering units, where each of the graphics cards 
contains two rendering processors; 

Figure 10 illustrates a methodology for applying spatial dithering to sample data
components in a graphics system. 



While the invention is susceptible to various modifications and alternative forms,
specific embodiments thereof are shown by way of example in the drawings and will
herein be described in detail. It should be understood, however, that the drawings and
detailed description thereto are not intended to limit the invention to the particular form
disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and
alternatives falling within the spirit and scope of the present invention as defined by the
appended claims. Note, the headings are for organizational purposes only and are not
meant to be used to limit or interpret the description or claims. Furthermore, note that the
word "may" is used throughout this application in a permissive sense (i.e., having the
potential to, being able to), not a mandatory sense (i.e., must). The term "include", and
derivations thereof, mean "including, but not limited to". The term "connected" means
"directly or indirectly connected", and the term "coupled" means "directly or indirectly
connected".

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS



This detailed description discloses various embodiments of a graphics system
architecture that uses (a) a set S_GC of standard OEM graphics cards to generate samples
and (b) a series S_FU of filtering units coupled to the OEM graphics cards. The series of
filtering units receive samples from the OEM graphics cards and spatially filter the
samples to generate video output pixels. OEM is an acronym for "original equipment
manufacturer".

Let N_GC denote the number of OEM graphics cards in the set S_GC. The OEM
graphics cards of the set S_GC are denoted as GC(0), GC(1), GC(2), ..., GC(N_GC-1). The
number N_GC is a positive integer.

Let N_FU denote the number of filtering units in the series S_FU. The filtering units
of the series S_FU are denoted FU(0), FU(1), FU(2), ..., FU(N_FU-1). The number N_FU is a
positive integer.

A host computer directs the broadcast of graphics data from host memory (i.e., a
memory associated with the host computer) to the OEM graphics cards GC(0), GC(1),
GC(2), ..., GC(N_GC-1). The graphics data may specify primitives such as polygons, lines
and dots.

The graphics cards collaboratively generate samples corresponding to a collection
C_SP of sample positions in a virtual screen space as suggested by Figure 1A. Virtual
screen space may be interpreted as being partitioned into an array of bins, each bin being
a 1x1 square. Let X_v and Y_v denote the coordinates of virtual screen space. The
boundaries of the bins may be defined by lines of the form "X_v equal to an integer" and
lines of the form "Y_v equal to an integer". Each graphics card GC(K) may generate
samples corresponding to a subset T_K of the collection C_SP of sample positions. Each
sample includes a set of data components such as red, green and blue color components.
A sample may also include data components such as depth and/or alpha, blur value,
brighter-than-bright value, etc.

It is noted that values N_B=6 and M_B=7 for the sizes of the spatial bin array have
been chosen for the sake of illustration, and are much smaller than would typically be
used in most applications.

In one set of embodiments, graphics card GC(K), K=0, 1, 2, ..., N_GC-1, may
generate samples for sample positions of the rectangular array

T_K = {(I,J) + (V_X(K),V_Y(K)): I=0, 1, 2, ..., M_B(K)-1, J=0, 1, 2, ..., N_B(K)-1}

as suggested by Figure 1B. V_X(K) and V_Y(K) represent the horizontal and vertical
displacements of rectangular array T_K from the origin of virtual screen space. M_B(K) is
the horizontal array resolution, and N_B(K) is the vertical array resolution. Each graphics
card GC(K) may include programmable registers which store the values V_X(K), V_Y(K),
M_B(K) and N_B(K).

Host software may set all the vertical resolutions N_B(K) to a common value N_B,
set all the horizontal resolutions M_B(K) equal to a common value M_B, and set the vector
displacements (V_X(K),V_Y(K)), K=0, 1, 2, ..., N_GC-1, so that they reside in the unit square
U={(x,y): 0 ≤ x,y < 1}. In particular, the vector displacements may be set so that they
attain N_GC distinct positions within the unit square, e.g., positions that are uniformly
distributed (or approximately uniformly distributed) over the unit square. Thus, for every
I in the range 0, 1, 2, ..., M_B-1, and every J in the range 0, 1, 2, ..., N_B-1, the bin at bin
address (I,J) contains a sample position Q_K(I,J)=(I,J) + (V_X(K),V_Y(K)) of the array T_K,
K=0, 1, 2, ..., N_GC-1, as suggested by Figure 2 in the case N_GC = 16.
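
By way of illustration only, the following Python sketch shows one way host software
might choose N_GC = 16 distinct displacement vectors inside the unit square and enumerate
the resulting sample positions of each array T_K. The function names, the regular 4x4
layout of the displacements, and the array sizes M_B=7 and N_B=6 are assumptions made for
this sketch; the description above requires only that the displacements be distinct and
approximately uniformly distributed.

```python
def grid_displacements(n_side):
    """Return n_side*n_side displacement vectors (V_X, V_Y) inside [0,1) x [0,1).

    Assumption: a regular grid is used; any roughly uniform layout would do.
    """
    step = 1.0 / n_side
    return [((a + 0.5) * step, (b + 0.5) * step)
            for b in range(n_side) for a in range(n_side)]


def sample_positions(displacements, m_b, n_b):
    """Return {K: [(x, y), ...]} where card K owns the array T_K = (I,J) + V(K)."""
    return {
        k: [(i + vx, j + vy) for j in range(n_b) for i in range(m_b)]
        for k, (vx, vy) in enumerate(displacements)
    }


disp = grid_displacements(4)                 # 16 cards, as in the N_GC = 16 case
t = sample_positions(disp, m_b=7, n_b=6)     # illustrative bin array sizes
```

With this layout every bin (I,J) contains exactly one sample position from each of the
16 cards.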

The host computer may direct the broadcast of a stream of primitives
corresponding to an animation frame to all the graphics cards. Each graphics card GC(K)
may receive the stream of primitives, compute (i.e., render) samples at the sample
positions of the rectangular array T_K based on the received primitives, temporarily store
the samples in an internal frame buffer, and then forward the samples from the internal
frame buffer to a corresponding one of the filtering units. The samples (i.e., the color
components of the samples) may experience a loss of precision in the process of
buffering and forwarding. The sample rendering hardware in graphics card GC(K) may
compute each sample color component with P_1 bits of precision. However, the frame
buffer may be configured to store each sample color component with P_2 bits of precision,
where P_2 is smaller than P_1. Thus, sample color components experience a loss of (P_1-P_2)
bits of precision in the process of being stored into the internal frame buffer.

It is noted that graphics cards typically allow the displacement values V_X(K) and
V_Y(K) to take values over a wider range than merely the interval [0,1). Thus, the
example given above and the configurations shown in Figures 1A, 1B and 2 are meant to
be suggestive and not limiting. Some or all of the vector displacements (V_X(K),V_Y(K)),
K=0, 1, 2, ..., N_GC-1, may be assigned positions outside the unit square.

In one set of embodiments, a graphics system may be configured with one
filtering unit per graphics card (i.e., N_GC=N_FU) as suggested by Figure 3. Each filtering
unit FU(K) receives a stream H_K of samples from the corresponding graphics card
GC(K). The stream H_K contains samples computed on the subset T_K of sample positions.

As suggested by Figure 4, each filtering unit FU(K) may scan through virtual
screen space in raster fashion generating virtual pixel centers denoted by the small plus
markers, and generating a set of partial sums (e.g., one partial sum for each color plus a
partial sum for filter coefficients) at each of the virtual pixel centers based on one or more
samples from the stream H_K in the neighborhood of the virtual pixel center. Recall that
samples of the stream H_K correspond to sample positions of the subset T_K. These
sample positions are denoted as small circles in Figure 4. (The virtual pixel centers are
also referred to as filter centers or convolution centers.) The filtering units are coupled in
a series to facilitate the pipelined accumulation of the sets of partial sums for each video
output pixel.

The array A_PC(K) of virtual pixel centers traversed by the filtering unit FU(K)
may be characterized by the following programmable parameters: a horizontal spacing
ΔX(K), a vertical spacing ΔY(K), a horizontal start displacement X_start(K), a vertical start
displacement Y_start(K), a horizontal resolution N_H(K) and a vertical resolution N_V(K).
The horizontal resolution N_H(K) is the number of virtual pixel centers in a horizontal line
of the array A_PC(K). The vertical resolution N_V(K) is the number of virtual pixel centers
in a vertical line of the array A_PC(K). Filtering unit FU(K) includes registers for
programmably setting the parameters ΔX(K), ΔY(K), X_start(K), Y_start(K), N_H(K) and
N_V(K). Host software may set these parameters so that the arrays A_PC(K), K=0, 1, 2, ...,
N_FU-1, are identical.
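
As a minimal sketch of how the six parameters determine the array of virtual pixel
centers, the following Python generator enumerates the centers in raster order. The
function name and argument names are hypothetical, and the row-by-row traversal order is
an assumption consistent with the raster scanning described above.

```python
def pixel_centers(x_start, y_start, dx, dy, n_h, n_v):
    """Yield virtual pixel centers (X_P, Y_P) row by row (raster order).

    x_start, y_start : start displacements X_start(K), Y_start(K)
    dx, dy           : horizontal and vertical spacings
    n_h, n_v         : horizontal and vertical resolutions N_H(K), N_V(K)
    """
    for row in range(n_v):
        for col in range(n_h):
            yield (x_start + col * dx, y_start + row * dy)


# One center in the middle of each 1x1 bin of a 3-wide, 2-high region.
centers = list(pixel_centers(0.5, 0.5, 1.0, 1.0, n_h=3, n_v=2))
```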

The filtering units collaborate to compute a video pixel P at a particular virtual
pixel center as suggested by Figure 5. The video pixel is computed based on a filtration
of samples corresponding to sample positions (of the collection C_SP) within a support
region centered on (or defined by) the virtual pixel center. Each filtering unit FU(K)
performs a respective portion of the sample filtration. (The sample positions falling
within the support region are denoted as small black dots. In contrast, sample positions
outside the support region are denoted as small circles.) Each sample S corresponding to
a sample position Q within the support region may be assigned a filter coefficient C_S
based on the sample position Q. The filter coefficient C_S may be a function of the spatial
displacement between the sample position Q and the virtual pixel center. In some
embodiments, the filter coefficient C_S is a function of the radial distance between the
sample position Q and the virtual pixel center.

5681-59200

Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.

Each of the color components (r_P, g_P, b_P) of the video pixel may be determined by
computing a weighted summation of the corresponding sample color components of the
samples falling inside the filter support region. (A sample is said to fall inside the filter
support region when its corresponding sample position falls inside the filter support
region.) For example, a red summation value r_P for the video pixel P may be computed
according to the expression

r_P = Σ C_S r_S,    (1)

where the summation ranges over each sample S in the filter support region, and where r_S
is the red sample value of the sample S. In other words, the red component of each
sample S in the filter support region is multiplied by the corresponding filter coefficient
C_S, and the resulting products are summed. Similar weighted summations are performed
to determine a green summation value g_P and a blue summation value b_P for the video
pixel P based respectively on the green color components g_S and the blue color
components b_S of the samples:

g_P = Σ C_S g_S,    (1')

b_P = Σ C_S b_S.    (1'')

Furthermore, a normalization value E_P (i.e., a coefficient summation value) may
be computed by adding up the filter coefficients C_S for the samples S in the filter support
region, i.e.,

E_P = Σ C_S.    (2)

The summation values may then be multiplied by the reciprocal of E_P (or equivalently,
divided by E_P) to determine normalized pixel values:

R_P = (1/E_P)*r_P    (3)

G_P = (1/E_P)*g_P    (4)

B_P = (1/E_P)*b_P    (5)
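
The weighted summation and normalization of expressions (1) through (5) can be sketched
in Python as follows. This is an illustrative software model only, not the hardware
implementation; the function name `filter_pixel`, the tuple layout of a sample, and the
callback `coeff` (which returns the filter coefficient C_S for a given displacement and
returns zero outside the support region) are assumptions of this sketch.

```python
def filter_pixel(samples, center, coeff):
    """Compute normalized pixel colors (R_P, G_P, B_P) at a virtual pixel center.

    samples : iterable of (x, y, r, g, b) tuples
    center  : (X_P, Y_P)
    coeff   : function (dx, dy) -> C_S; assumed to return 0 outside the support
    """
    xp, yp = center
    r_p = g_p = b_p = e_p = 0.0
    for x, y, r, g, b in samples:
        c = coeff(x - xp, y - yp)
        r_p += c * r          # expression (1)
        g_p += c * g          # expression (1')
        b_p += c * b          # expression (1'')
        e_p += c              # expression (2)
    if e_p == 0.0:
        return (0.0, 0.0, 0.0)   # assumption: no samples in support gives black
    return (r_p / e_p, g_p / e_p, b_p / e_p)   # expressions (3)-(5)


def unit_box(dx, dy):
    """Hypothetical coefficient function: weight 1 inside a 1x1 box."""
    return 1.0 if abs(dx) < 0.5 and abs(dy) < 0.5 else 0.0


pixel = filter_pixel(
    [(0.25, 0.25, 10, 20, 30), (0.75, 0.75, 20, 40, 60), (5.0, 5.0, 100, 100, 100)],
    center=(0.5, 0.5),
    coeff=unit_box,
)
```

Here the third sample lies outside the support region and contributes nothing, so the
result is the average of the first two samples.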

To implement the computation described above, each of the filtering units FU(K),
K=0, 1, 2, ..., N_FU-1, may compute partial sums PS_r(K), PS_g(K), PS_b(K) and PS_E(K) for
the video pixel P based on the samples from its input stream H_K that correspond to
sample positions interior to the filter support as suggested by Figure 6. Filtering unit
FU(K) may include (or couple to) an input buffer INB(K) (not shown) which serves to
buffer N_LB horizontal scan lines of samples from the input stream H_K. The parameter
N_LB may be greater than or equal to the vertical bin size of a bin neighborhood containing
the filter support. The bin neighborhood may be a rectangle (or square) of bins. For
example, in one embodiment the bin neighborhood is a 5x5 array of bins centered on the
bin which contains the virtual pixel position as suggested by Figure 6.

The summation values r_P, g_P, b_P and E_P are developed cumulatively in the series
of filtering units as follows. Each filtering unit FU(K), K=1, 2, 3, ..., N_FU-2, receives
cumulative sums CS_r(K-1), CS_g(K-1), CS_b(K-1) and CS_E(K-1) from a previous filtering
unit FU(K-1), adds its partial sums to the received cumulative sums respectively to form
updated cumulative sums according to the relations

CS_r(K) = CS_r(K-1) + PS_r(K)
CS_g(K) = CS_g(K-1) + PS_g(K)
CS_b(K) = CS_b(K-1) + PS_b(K)
CS_E(K) = CS_E(K-1) + PS_E(K),

and transmits the updated cumulative sums to the next filtering unit FU(K+1).

The first filtering unit FU(0) assigns the values of the partial sums PS_r(0), PS_g(0),
PS_b(0) and PS_E(0) to the cumulative sums CS_r(0), CS_g(0), CS_b(0) and CS_E(0)
respectively, and transmits these cumulative sums to filtering unit FU(1).

The last filtering unit FU(N_FU-1) receives cumulative sums CS_r(N_FU-2),
CS_g(N_FU-2), CS_b(N_FU-2) and CS_E(N_FU-2) from the previous filtering unit FU(N_FU-2),
adds the partial sums PS_r(N_FU-1), PS_g(N_FU-1), PS_b(N_FU-1) and PS_E(N_FU-1) to the
received cumulative sums respectively to form final cumulative sums CS_r(N_FU-1),
CS_g(N_FU-1), CS_b(N_FU-1) and CS_E(N_FU-1) according to the relations

CS_r(N_FU-1) = CS_r(N_FU-2) + PS_r(N_FU-1)
CS_g(N_FU-1) = CS_g(N_FU-2) + PS_g(N_FU-1)
CS_b(N_FU-1) = CS_b(N_FU-2) + PS_b(N_FU-1)
CS_E(N_FU-1) = CS_E(N_FU-2) + PS_E(N_FU-1).

These final cumulative sums are the summation values described above, i.e.,

r_P = CS_r(N_FU-1)
g_P = CS_g(N_FU-1)
b_P = CS_b(N_FU-1)
E_P = CS_E(N_FU-1).

Furthermore, the last filtering unit FU(N_FU-1) may be programmably configured to
perform the normalizing computations to determine the color components R_P, G_P and B_P
of the video pixel P:

R_P = (1/E_P)*r_P
G_P = (1/E_P)*g_P
B_P = (1/E_P)*b_P.
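
The chained accumulation described above can be modeled in software as follows. This is a
sequential sketch of what the series of filtering units does in pipelined hardware; the
function names and the representation of each set of sums as a 4-tuple (red, green, blue,
coefficient sum) are assumptions made for illustration.

```python
def partial_sums(samples, coeffs):
    """Partial sums (PS_r, PS_g, PS_b, PS_E) for one filtering unit.

    samples : list of (r, g, b) colors from this unit's input stream
    coeffs  : matching list of filter coefficients C_S
    """
    ps_r = sum(c * r for (r, _, _), c in zip(samples, coeffs))
    ps_g = sum(c * g for (_, g, _), c in zip(samples, coeffs))
    ps_b = sum(c * b for (_, _, b), c in zip(samples, coeffs))
    ps_e = sum(coeffs)
    return (ps_r, ps_g, ps_b, ps_e)


def accumulate(partials):
    """Pass cumulative sums down the series: CS(K) = CS(K-1) + PS(K)."""
    cs = partials[0]                       # FU(0): CS(0) = PS(0)
    for ps in partials[1:]:                # FU(1) ... FU(N_FU-1)
        cs = tuple(a + b for a, b in zip(cs, ps))
    return cs


def normalize(cs):
    """Final step in the last unit: divide color sums by the coefficient sum."""
    r_p, g_p, b_p, e_p = cs
    return (r_p / e_p, g_p / e_p, b_p / e_p)


units = [
    partial_sums([(10, 20, 30)], [1.0]),   # FU(0)
    partial_sums([(20, 40, 60)], [1.0]),   # FU(1)
    partial_sums([], []),                  # FU(2): no samples in the support
]
final = accumulate(units)
pixel = normalize(final)
```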

In this fashion, the series of filtering units collaborate to generate each video pixel
in a stream of video pixels. The series of filtering units may be driven by a common pixel
clock. Furthermore, each filtering unit may transmit cumulative sums to the next filtering
unit using source-synchronous signaling. The last filtering unit FU(N_FU-1) may forward
the stream of video pixels to a digital-to-analog (D/A) conversion device. The D/A
conversion device converts the video pixel stream to an analog video signal and provides
the analog video signal to an analog output port accessible by one or more display
devices. Alternatively, the last filtering unit FU(N_FU-1) may forward the video pixel
stream to a digital video output port accessible by one or more display devices.

The filter coefficient C_S for a sample S in the filter support region may be
determined by a table lookup. For example, a radially symmetric filter may be realized by
a filter coefficient table, which is addressed by a function of the sample's radial distance
with respect to the virtual pixel center. The filter support for a radially symmetric filter
may be a circular disk as suggested by the example of Figure 6. (The support of a filter is
the region in virtual screen space on which the filter is defined.) The terms "filter" and
"kernel" are used as synonyms herein. Let R_f denote the radius of the circular support
disk.

Filtering unit FU(K) may access a sample S corresponding to the bin
neighborhood from its input buffer INB(K). (Recall that the samples stored in the input
buffer INB(K) are received from graphics card GC(K) and correspond to sample
positions of the subset T_K.) Filtering unit FU(K) may compute the square radius (D_S)^2 of
the sample's position (X_S,Y_S) with respect to the virtual pixel center (X_P,Y_P) according to
the expression

(D_S)^2 = (X_S-X_P)^2 + (Y_S-Y_P)^2.

The square radius (D_S)^2 may be compared to the square radius (R_f)^2 of the filter support.
If the sample's square radius is less than (or, in a different embodiment, less than or equal
to) the filter's square radius, the sample S may be marked as being valid (i.e., inside the
filter support). Otherwise, the sample S may be marked as invalid.

Filtering unit FU(K) may compute a normalized square radius U_S for a valid
sample S by multiplying the sample's square radius by the reciprocal of the filter's square
radius:

U_S = (D_S)^2 * (1/(R_f)^2).

The normalized square radius U_S may be used to access the filter coefficient table for the
filter coefficient C_S. The filter coefficient table may store filter coefficients indexed by
the normalized square radius.
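
A minimal Python sketch of the validity test and table lookup follows. The table size,
the linear example kernel, the nearest-lower-entry indexing scheme and the function names
are all assumptions of this sketch; the description above specifies only that the table is
indexed by the normalized square radius.

```python
def make_table(kernel, size):
    """Tabulate kernel(u) at `size` evenly spaced values of u in [0, 1)."""
    return [kernel(i / size) for i in range(size)]


def coefficient(table, xs, ys, xp, yp, rf_sq, inv_rf_sq):
    """Return C_S for a sample at (xs, ys), or None if the sample is invalid.

    rf_sq     : square radius (R_f)^2 of the filter support
    inv_rf_sq : reciprocal 1/(R_f)^2, stored separately so no divide is needed
    """
    d_sq = (xs - xp) ** 2 + (ys - yp) ** 2        # square radius (D_S)^2
    if d_sq >= rf_sq:                             # strict "less than" embodiment
        return None                               # outside the filter support
    u = d_sq * inv_rf_sq                          # normalized square radius U_S
    index = min(int(u * len(table)), len(table) - 1)
    return table[index]


# Hypothetical kernel that falls off linearly in U_S, tabulated in 4 entries.
table = make_table(lambda u: 1.0 - u, 4)          # [1.0, 0.75, 0.5, 0.25]
c_center = coefficient(table, 0.0, 0.0, 0.0, 0.0, 4.0, 0.25)
c_mid = coefficient(table, 1.0, 0.0, 0.0, 0.0, 4.0, 0.25)
c_outside = coefficient(table, 2.0, 0.0, 0.0, 0.0, 4.0, 0.25)
# Same table, doubled radius: only rf_sq and inv_rf_sq change.
c_wider = coefficient(table, 2.0, 0.0, 0.0, 0.0, 16.0, 1.0 / 16.0)
```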

In various embodiments, the filter coefficient table is implemented in RAM and is
programmable by host software. Thus, the filter function (i.e., the filter kernel) used in
the filtering process may be changed as needed or desired. Similarly, the square radius
(R_f)^2 of the filter support and the reciprocal square radius 1/(R_f)^2 of the filter support may
be programmable. Each filtering unit FU(K) may include its own filter coefficient table.

Because the entries in the filter coefficient table are indexed according to
normalized square distance, they need not be updated when the radius R_f of the filter
support changes. The filter coefficients and the filter radius may be modified
independently.

In one embodiment, the filter coefficient table may be addressed with the sample
radius D_S at the expense of computing a square root of the square radius (D_S)^2. In another
embodiment, the square radius may be converted into a floating-point format, and the
floating-point square radius may be used to address the filter coefficient table. It is noted
that the filter coefficient table may be indexed by any of various radial distance measures.
For example, an L^1 norm or L^∞ norm may be used to measure the distance between a
sample position and the virtual pixel center.

Invalid samples may be assigned the value zero for their filter coefficients. Thus,
the invalid samples end up making a null contribution to the pixel value summations. In
other embodiments, filtering hardware internal to the filtering unit FU(K) may be
configured to ignore invalid samples. Thus, in these embodiments, it is not necessary to
assign filter coefficients to the invalid samples.

In some embodiments, the filtering units may support multiple filtering modes.
For example, in one collection of embodiments, each filtering unit supports a box
filtering mode as well as a radially symmetric filtering mode. In the box filtering mode,
the filtering units may implement a box filter over a rectangular support region, e.g., a
square support region with radius R_f (i.e., side length 2R_f). Thus, the filtering units may
compute boundary coordinates for the support square according to the expressions X_P+R_f,
X_P-R_f, Y_P+R_f, and Y_P-R_f. Each sample S in the bin neighborhood may be marked as
being valid if the sample's position (X_S,Y_S) falls within the support square, i.e., if

X_P-R_f < X_S < X_P+R_f and
Y_P-R_f < Y_S < Y_P+R_f.

Otherwise the sample S may be marked as invalid. Each valid sample may be assigned
the same filter weight value (e.g., C_S=1). It is noted that any or all of the strict
inequalities (<) in the system above may be replaced with permissive inequalities (≤).
Various embodiments along these lines are contemplated.
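
The validity test of the box filtering mode might be sketched as follows. The function
name and the `inclusive` flag (which switches all four comparisons at once between the
strict and permissive variants) are assumptions made for illustration; the text above
permits the inequalities to be chosen individually.

```python
def box_valid(xs, ys, xp, yp, rf, inclusive=False):
    """Return True if sample position (xs, ys) lies in the support square.

    The square is centered on the virtual pixel center (xp, yp) with
    "radius" rf, i.e., side length 2*rf.
    """
    if inclusive:
        return (xp - rf <= xs <= xp + rf) and (yp - rf <= ys <= yp + rf)
    return (xp - rf < xs < xp + rf) and (yp - rf < ys < yp + rf)


inside = box_valid(0.75, 0.25, 0.5, 0.5, 0.5)
on_edge_strict = box_valid(1.0, 0.5, 0.5, 0.5, 0.5)
on_edge_permissive = box_valid(1.0, 0.5, 0.5, 0.5, 0.5, inclusive=True)
```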

The filtering units may use any of a variety of filters either alone or in
combination to compute pixel values from sample values. For example, the filtering
units may use a box filter, a tent filter, a cone filter, a cylinder filter, a Gaussian filter, a
Catmull-Rom filter, a Mitchell-Netravali filter, a windowed sinc filter, or in general, any
form of band pass filter or any of various approximations to the sinc filter.

Loss of Sample Color Precision

As described above, the filtering units FU(0), FU(1), FU(2), ..., FU(N_FU-1)
collaborate to perform a spatial filtration (i.e., an averaging computation) on the color
components (r_S, g_S, b_S) of samples as received from the graphics cards in order to
generate corresponding pixel color components (R_P, G_P, B_P). However, it is important to
note that the precision of the sample color components as received by the filtering units
may be smaller than the precision used in the graphics cards to originally compute the
sample color components.

Graphics card GC(K) includes an internal frame buffer FB(K) and a video data
port VDP(K) through which the graphics card GC(K) is configured to output video data
as suggested by Figure 7. The video data port VDP(K) is used to transfer the stream H_K
of samples (corresponding to sample positions of the subset T_K) to the filtering unit
FU(K). Thus, filtering unit FU(K) couples to the video data port VDP(K).

In some embodiments, graphics card GC(K) may compute the color components
of samples (corresponding to sample positions of the subset T_K) at a first precision P_1,
temporarily store the sample color components in frame buffer FB(K), and then forward
the sample color components from the frame buffer FB(K) to filtering unit FU(K)
through the video data port VDP(K). In the process of buffering and forwarding, the
sample color components are truncated down to a second lower precision P_2 imposed by
the frame buffer FB(K) and/or the video data port VDP(K). For example, a video data
port conforming to the Digital Video Interface (DVI) specification may allow only 8 bits
per color component, especially for video formats with video clock rate greater than 166
MHz.

Let R_S, G_S and B_S denote the color components of a sample S as computed by the
graphics card GC(K) at the first precision P_1. Let Trn(R_S), Trn(G_S) and Trn(B_S) represent
the truncated sample color components as sent to the filtering unit (through the sample
stream H_K) at the second lower precision P_2. Thus, the (P_1-P_2) least significant bits of X
are missing from Trn(X), where X=R_S, G_S, B_S.

Filtering unit FU(K) computes its partial sums based on the truncated color
components Trn(R_S), Trn(G_S) and Trn(B_S) received from the video data port VDP(K). In
other words, the truncated color components Trn(R_S), Trn(G_S) and Trn(B_S) take the roles
of r_S, g_S and b_S respectively in the filtration computation described above.

In another set of embodiments, instead of mere truncation, graphics card GC(K)
may be configured to round the color components R_S, G_S and B_S of each computed
sample S to the nearest state of the lower precision operand (i.e., the precision P_2
operand). Graphics card GC(K) may implement the rounding by adding w=1/2 to the color
components R_S, G_S and B_S prior to the truncating action of buffering and forwarding to
the filtering unit FU(K). The value w=1/2 is aligned so that the ones bit position of
operand w corresponds to the least significant bit of the color component R_S, G_S or B_S
that survives the truncation. In this case, filtering unit FU(K) computes its partial sums
based on the rounded color components Trn(R_S+1/2), Trn(G_S+1/2) and Trn(B_S+1/2). The
rounding algorithm induces less error on average than the pure truncation algorithm.
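
The difference between pure truncation and rounding can be illustrated with unsigned
fixed-point integers. In the sketch below, a color component is held as a P_1-bit integer
and truncation drops its (P_1-P_2) low-order bits. The function names, the choice P_1=12
and P_2=8, and the clamping of the sum to the largest P_1-bit value (to avoid overflow
when adding w) are assumptions made for illustration.

```python
def truncate(x, p1, p2):
    """Trn(X): keep the P_2 most significant bits of a P_1-bit value."""
    return x >> (p1 - p2)


def round_then_truncate(x, p1, p2):
    """Trn(X + 1/2): add w = 1/2 (relative to the surviving LSB), then truncate."""
    shift = p1 - p2
    half = 1 << (shift - 1)            # w = 1/2 aligned to the surviving LSB
    max_val = (1 << p1) - 1
    return min(x + half, max_val) >> shift   # clamp is an assumption


P1, P2 = 12, 8
x = 0x7F8                               # 127.5 in units of the surviving LSB
t = truncate(x, P1, P2)                 # 127
r = round_then_truncate(x, P1, P2)      # 128

# Total absolute error over all 12-bit inputs, measured at P_1 precision.
shift = P1 - P2
err_trunc = sum(abs(v - (truncate(v, P1, P2) << shift)) for v in range(1 << P1))
err_round = sum(abs(v - (round_then_truncate(v, P1, P2) << shift))
                for v in range(1 << P1))
```

Summed over all inputs, the rounding error is smaller than the truncation error, which
agrees with the statement above.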


Dithering to Recover Added Precision 

In one set of embodiments, the set of graphics cards GC(0), GC(1), ..., GC(N_GC-1)
may apply a spatial dithering operation to the sample color components prior to
buffering and forwarding the sample color components to the filtering units. Each
graphics card GC(K), K=0, 1, 2, ..., N_GC-1, may be programmably configured to add a
dither value D_K to the sample color components R_S, G_S, B_S of each computed sample S to
generate dithered sample color components R_S', G_S', B_S' according to the relations:

R_S' = R_S + D_K
G_S' = G_S + D_K    (6)
B_S' = B_S + D_K.

These additions may be implemented using a programmable pixel shader in the graphics
card GC(K). (Any of a variety of modern OEM graphics cards include programmable
pixel shaders.) The dithered color values R_S', G_S' and B_S' of the sample S are then
buffered and forwarded to the filtering unit FU(K) through the video data port VDP(K)
instead of the originally computed color values R_S, G_S and B_S. In the process of buffering
and forwarding, the dithered color values R_S', G_S' and B_S' get truncated down to the lower
precision values Trn(R_S'), Trn(G_S') and Trn(B_S'). It is these truncated values Trn(R_S'),
Trn(G_S') and Trn(B_S') that are output from the video data port VDP(K) in the stream H_K
and received by the filtering unit FU(K). Thus, the filtering unit FU(K) computes its
partial sums based on the truncated color values Trn(R_S'), Trn(G_S') and Trn(B_S'). The
series of filtering units FU(0), FU(1), ..., FU(N_FU-1) compute the color components of a
video pixel P by accumulating the partial sums and normalizing the final sum as
described above. The dithering operation allows the pixel color components computed by
the series of filtering units to more closely approximate the ideal pixel color components
which would be obtained from performing the same summation and normalization
computations on the original high-precision (i.e., precision P_1) sample color components
R_S, G_S and B_S.

The set of dither values D_K, K=0, 1, ..., N_GC-1, may be configured to have an
average value of 1/2 (or approximately 1/2). In some embodiments, the set of dither values
may approximate a uniform distribution of numbers between 1/2-A and 1/2+A, where A is a
rational number greater than or equal to one. The dither radius A may be programmable.
In one particular embodiment, A equals one. In the dithering summations (6), the least
significant bit position of the sample color values Rs, Gs and Bs which is to survive
truncation is aligned with the ones bit position of the dither value operand. Equivalently,
the most significant bit position of the sample color values which gets discarded in the
truncation is aligned with the 1/2 bit position of the dither value operand.
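
The bit alignment described above can be illustrated with a minimal fixed-point sketch. It is not taken from the disclosure: the function names, the choice of 4 discarded fractional bits, and the particular evenly spaced dither set are all assumptions made for illustration. The sketch only shows one way a set of dither values with average 1/2 and radius A=1 might be built and added to a sample value before truncation.

```python
from fractions import Fraction
from math import floor


def dither_values(n_cards, radius=Fraction(1)):
    """Return n_cards evenly spaced dither values centered on 1/2.

    Hypothetical choice: the values are the midpoints of n_cards equal slices
    of the interval [1/2 - radius, 1/2 + radius], so their mean is exactly 1/2.
    """
    radius = Fraction(radius)
    half = Fraction(1, 2)
    step = 2 * radius / n_cards
    return [half - radius + step * (k + half) for k in range(n_cards)]


def to_fixed(x, frac_bits):
    """Convert x to a fixed-point integer with frac_bits fractional bits."""
    return floor(x * (1 << frac_bits))


def dither_and_truncate(value_fx, dither_fx, frac_bits):
    """Add the dither operand, then drop the frac_bits low-order bits.

    Both operands share the same binary point, so the ones bit of the dither
    lines up with the lowest bit that survives truncation, and the 1/2 bit
    lines up with the highest bit that is discarded.
    """
    return (value_fx + dither_fx) >> frac_bits


FRAC_BITS = 4  # assumed number of bits lost to truncation
dithers = dither_values(16)  # hypothetical set for 16 graphics cards
sample = to_fixed(Fraction(37, 4), FRAC_BITS)  # the value 9.25 in fixed point
low = dither_and_truncate(sample, to_fixed(dithers[0], FRAC_BITS), FRAC_BITS)
high = dither_and_truncate(sample, to_fixed(dithers[15], FRAC_BITS), FRAC_BITS)
```

With a dither radius of one, the same high-precision value can land on different integers depending on which dither value is applied (here `low` and `high`), which is what lets the later averaging recover part of the discarded fraction.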

Tabulated Example of Spatial Dithering 

Figure 8 presents a tabulated example of the beneficial effects of spatial dithering
in the case where a series of 16 filtering units is configured to
perform box filtering on the samples in each bin to determine video pixels. The
tabulated example focuses on the red color channel, but it is to be understood that spatial
dithering may be applied to any or all of the color channels and to non-color channels such
as alpha.

Host software may perform a series of register writes to set array parameters
Xstart(K)=1/2, Ystart(K)=1/2 and ΔX(K)=ΔY(K)=1, for K=0, 1, 2, ..., 15. Furthermore,
host software may set a filter mode control register in each filter unit to an appropriate
value to turn on box filtering.

Each graphics card GC(K) computes a red color value R_K for a sample S_K at a
sample position Q_K in a given bin as suggested by Figure 2. An exemplary set of
computed red values R_K is given in the second column of Figure 8. Graphics card
GC(K) adds dither value D_K to the red value R_K to obtain dithered red value R_K' = R_K + D_K.
An exemplary set of dither values D_K is shown in the third column. The dither values
D_K are chosen to have an average value of approximately 1/2.

In the process of buffering and forwarding the dithered red value R_K' to filtering
unit FU(K), the fractional part of the dithered red value R_K' is discarded, leaving only the
integer part as the truncated dithered value Trn(R_K'). The filtering units collaboratively
accumulate a summation S_TD of the truncated dithered values Trn(R_K'), K=0, 1, 2, ..., 15.
The summation S_TD equals 147. This departs by only a small amount (i.e., 1/8) from the
summation S_red of the originally computed red values R_K, K=0, 1, 2, ..., 15; S_red equals
147.125. In contrast, the summation S_rnd of the rounded red values Trn(R_K+1/2), K=0,
1, 2, ..., 15, is equal to 144, much further from "truth", i.e., the summation S_red.

Thus, spatial dithering may allow the averages of the color values (or, more
generally, sample component values) to be more faithfully preserved through
truncation than simple rounding without spatial dithering. It is noted that the advantages
of dithering may be more pronounced when the original sample color values are tightly
clustered about their average value.
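
The comparison between dithered truncation and simple rounding can be reproduced with a small sketch. The data below are hypothetical and are not the values tabulated in Figure 8: all 16 red values are assumed equal to 9.2 (a tightly clustered set), and the dither values are an assumed evenly spaced set with average 1/2 and radius one. Exact rational arithmetic is used so that the sums are free of floating-point error.

```python
from fractions import Fraction
from math import floor

N = 16  # number of filtering units (and samples per bin) in this sketch
HALF = Fraction(1, 2)

# Hypothetical dither values D_K = -7/16 + K/8, which average exactly 1/2
# and lie between 1/2 - 1 and 1/2 + 1.
dither = [Fraction(-7, 16) + Fraction(k, 8) for k in range(N)]

# Hypothetical red values R_K, all equal to 9.2 (not the Figure 8 data).
red = [Fraction(46, 5)] * N

# Sum of the original high-precision values ("truth").
s_red = sum(red)

# Sum of the truncated dithered values Trn(R_K + D_K).
s_td = sum(floor(r + d) for r, d in zip(red, dither))

# Sum of the rounded values Trn(R_K + 1/2), i.e. no spatial dithering.
s_rnd = sum(floor(r + HALF) for r in red)

# A box filter would normalize each sum by N to obtain the pixel value.
pixel_true = s_red / N
pixel_dithered = Fraction(s_td, N)
pixel_rounded = Fraction(s_rnd, N)
```

Under these assumptions every sample rounds to the same integer, so rounding loses the common fraction entirely, whereas the dithered values spread across neighboring integers and their sum stays closer to the high-precision sum.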

The above tabulated example assumes box filtering of the samples with virtual
pixel centers constrained to the centers of sample bins by setting Xstart(K)=1/2,
Ystart(K)=1/2 and ΔX(K)=ΔY(K)=1, for K=0, 1, 2, ..., 15. However, it is noted that
spatial dithering may be applied more generally with any of various types of filtering. For
example, spatial dithering may be applied with any of a wide variety of radially
symmetric filters obtainable by programming the filtering coefficient tables in the filter
units. In addition, spatial dithering may be applied with box filtering and arbitrary values
of Xstart(K), Ystart(K), ΔX(K) and ΔY(K).
For additional disclosure on the subject of spatial dithering, please refer to U.S.
Patent Application No. 09/760,512, filed on January 11, 2001, entitled "Recovering
Added Precision from L-Bit Samples by Dithering the Samples Prior to an Averaging
Computation", invented by Nathaniel David Naegle. This patent application is hereby
incorporated by reference in its entirety.



Two Render Processors per Graphics Card

Some currently available OEM graphics cards are equipped with two graphics
processors and two corresponding video data ports (e.g., DVI ports). Thus, in one set of
embodiments, a series of such dual-processor graphics cards DC(0), DC(1), DC(2), ...,
DC(N_DC-1) may be configured to feed the series of filtering units FU(0), FU(1), FU(2),
..., FU(2N_DC-1) as suggested by Figure 9 in the case N_DC=4. In general, N_DC may be any
positive integer. Host software may program the dual-processor graphics card DC(K), K=0, 1, 2,
..., N_DC-1, to generate the streams H_2K and H_(2K+1) of dithered samples corresponding to
sample position subsets T_2K and T_(2K+1) respectively. The sample streams H_2K and H_(2K+1)
are supplied to filtering units FU(2K) and FU(2K+1) in parallel through the two video
data ports respectively. In particular, host software may program the dual-processor
graphics card DC(K) so that its first graphics processor generates sample stream H_2K and
its second graphics processor generates sample stream H_(2K+1).
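
The index mapping between dual-processor cards, sample streams and filtering units can be sketched as follows. The function names are hypothetical and the sketch only restates the 2K and 2K+1 relationship given above, using the four-card case as the example.

```python
def streams_for_card(k):
    """Return the stream indices (2K, 2K+1) produced by card DC(K).

    The first graphics processor is assumed to produce stream 2K and the
    second to produce stream 2K+1, matching the text above.
    """
    return (2 * k, 2 * k + 1)


def source_of_stream(j):
    """Return (card index, processor index) feeding filtering unit FU(j)."""
    return (j // 2, j % 2)


N_DC = 4  # number of dual-processor cards in this example

# Routing table: filtering unit index -> (card index, processor index).
routing = {j: source_of_stream(j) for j in range(2 * N_DC)}
```

For example, `routing[5]` identifies the second processor of card DC(2) as the source of stream H_5, which is received by filtering unit FU(5).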

Spatial Dithering Methodology 
In one set of embodiments, a method for generating graphical images may be
arranged as indicated in Figure 10. In step 1000, a host computer may broadcast a stream
of graphics primitives to a set of rendering processors.

In step 1010, each rendering processor RP(K) of said set of rendering processors
generates a stream of samples in response to the received graphics primitives. The
samples of the generated stream correspond to sample positions of the subset T_K as
described above.

In step 1020, each rendering processor RP(K) adds a dither value D_K to a data
component (such as red, green, blue or alpha) of each of the samples in the generated stream
to obtain dithered data components. The dither values may have an average value of 1/2
and a dither radius greater than or equal to one.

In step 1030, each rendering processor RP(K) buffers the dithered data
components in an internal frame buffer, and forwards a truncated version of the dithered
data components to a corresponding filtering unit. The filtering units may be coupled
together in a series.

In step 1040, the filtering units perform a weighted averaging computation in a
pipelined fashion on the truncated dithered data components to determine pixel data
components. The pixel data components are usable to determine at least a portion of a
displayable image.
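
Steps 1020 through 1040 can be summarized in a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, one data component per rendering processor is assumed to have already been generated (step 1010), truncation is modeled as discarding the fractional part, and the series of filtering units is modeled as a loop in which each unit adds its weighted contribution to a running partial sum before the final normalization.

```python
from fractions import Fraction
from math import floor


def add_dither(component, dither):
    """Step 1020: add the dither value D_K to a sample data component."""
    return component + dither


def buffer_and_forward(dithered):
    """Step 1030: model the precision loss as dropping the fractional part."""
    return floor(dithered)


def filter_series(truncated, weights):
    """Step 1040: accumulate weighted partial sums unit by unit, then normalize."""
    partial_sum = Fraction(0)
    weight_sum = Fraction(0)
    for value, weight in zip(truncated, weights):
        partial_sum += weight * value  # contribution of one filtering unit
        weight_sum += weight
    return partial_sum / weight_sum


def run_method(components, dithers, weights):
    """Chain steps 1020, 1030 and 1040 for one pixel data component."""
    truncated = [buffer_and_forward(add_dither(c, d))
                 for c, d in zip(components, dithers)]
    return filter_series(truncated, weights)


# Hypothetical inputs: 8 rendering processors, every sample component equal
# to 5.25, dither values -3/8 + K/4 (average 1/2, radius one), box weights.
components = [Fraction(21, 4)] * 8
dithers = [Fraction(-3, 8) + Fraction(k, 4) for k in range(8)]
pixel = run_method(components, dithers, [1] * 8)
```

For these particular inputs the normalized result equals the original 5.25, while truncating the undithered values would give 5. Other inputs will generally leave a small residual error.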

Numerous variations and modifications will become apparent to those skilled in
the art once the above disclosure is fully appreciated. It is intended that the following
claims be interpreted to embrace all such variations and modifications.